Detection of Topological Patterns in Complex Networks: 
Correlation Profile of the Internet 
A general scheme for detecting and analyzing topological patterns in large complex networks is presented. In this scheme the network in question is compared with its properly randomized version that preserves some of its low-level topological properties. Statistically significant deviations of any measurable property of a network from this null model likely reflect its design principles and/or evolutionary history. We illustrate this basic scheme on the example of the correlation profile of the Internet, which quantifies correlations between the connectivities of its neighboring nodes. This profile distinguishes the Internet from previously studied molecular networks with a similar scale-free connectivity distribution. Finally, we demonstrate that clustering in a network is very sensitive to both the connectivity distribution and its correlation profile, and we compare the clustering in the Internet to the appropriate null model.
Networks have emerged as a unifying theme in complex systems research. It is in fact no coincidence that networks and complexity are so heavily intertwined. Any future definition of a complex system should reflect the fact that such systems consist of many mutually interacting components. These components are not identical, as, say, electrons in condensed matter physics are. Instead, each of them has a unique identity separating it from the others. The most basic question one may ask about a complex system is: which other components does a given component interact with? System-wide, this information can be visualized as a graph whose nodes correspond to individual components and whose edges correspond to their mutual interactions. Such a network can be thought of as the backbone of the complex system, along which various signals and perturbations propagate.

Living organisms provide us with a quintessential paradigm of a complex system. Therefore, it should not be surprising that in biology networks appear on many different levels: from genetic regulation and signal transduction in individual cells, to the nervous systems of animals, and finally to food webs in ecosystems. However, complex networks are not limited to living systems; in fact, they lie at the foundation of an increasing number of artificial systems. The most prominent examples are the Internet and the World Wide Web, which are, respectively, the "hardware" and the "software" of the network of communications between computers.

An interesting common feature of many complex networks is an extremely broad, often scale-free, distribution of the connectivities of their nodes, where the connectivity of a node is defined as its number of immediate neighbors [1]. While the majority of nodes in such networks are each connected to just a handful of neighbors, there exist a few hub nodes that have a disproportionately large number of interaction partners.



The histogram of connectivities is an example of a low-level topological property of a network. While it answers the question of how many neighbors a given node has, it gives no information about the identity of those neighbors. Most of the non-trivial properties of networks lie in the exact way their nodes are connected to each other. However, such connectivity patterns are rather difficult to quantify and measure. Looking at many large complex networks, one gets the impression that they are wired in a rather haphazard way. One may wonder which topological properties of a given network are indeed random, and which arose due to evolution and/or fundamental design principles and limitations. Such non-random features can then be used to identify the network and to better understand the underlying complex system.

In this work we propose a universal recipe for extracting such information. To this end we first construct a proper randomized null model of a given network. As was pointed out in [2], the broad distributions of connectivities in most real complex networks indicate that connectivity is an important individual characteristic of a node, and as such it should be preserved in any meaningful randomization process. In addition to connectivities, one may choose to preserve some other low-level topological properties of the network. Any higher-level topological property, such as the pattern of correlations between connectivities of neighboring nodes, the number of loops of a certain type, the number and sizes of components, the diameter of the network, or the spectral properties of its adjacency matrix, can then be measured in the real complex network and separately in an ensemble of its randomized counterparts. Dealing with the whole ensemble allows one to put error bars on any quantity measured in the randomized network. One then concentrates only on those topological properties of the complex network that significantly deviate from the null model and are, therefore, likely to reflect its basic design principles and/or evolutionary history.

The local rewiring algorithm that randomizes a network yet strictly conserves the connectivities of its nodes [3,4] consists of repeated application of the elementary rewiring step shown and explained in detail in Fig. 1.




FIG. 1. One elementary step of the local rewiring algorithm. A pair of edges A-B and C-D is randomly selected. They are then rewired in such a way that A becomes connected to D, and C to B, provided that neither of these edges already exists in the network; otherwise the rewiring step is aborted and a new pair of edges is selected. The last restriction prevents the appearance of multiple edges connecting the same pair of nodes.

It is easy to see that the number of neighbors of every node in the network remains unchanged after an elementary step of this randomization procedure. The directed-network version of this algorithm separately conserves the numbers of upstream and downstream neighbors (in- and out-degrees) of every node.
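A minimal Python sketch of the elementary step of Fig. 1 follows, assuming an undirected network stored as a list of `(u, v)` tuples; `rewire_step` and `randomize` are hypothetical names, not code from the original study. The random re-orientation of the second edge is our own choice, so that both ways of exchanging partners between the two selected edges can be proposed.

```python
import random


def rewire_step(edges, edge_set, rng):
    """Attempt one elementary rewiring step (Fig. 1) on an undirected simple graph.

    edges: list of (u, v) tuples, modified in place.
    edge_set: set of frozenset({u, v}) mirroring `edges`, for O(1) lookups.
    Returns True if A-B, C-D was replaced by A-D, C-B.
    """
    i, j = rng.sample(range(len(edges)), 2)
    a, b = edges[i]
    c, d = edges[j]
    if rng.random() < 0.5:          # random orientation of the second edge
        c, d = d, c
    if a == d or c == b:            # would create a self-loop
        return False
    if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
        return False                # would create a multiple edge: abort
    edge_set -= {frozenset((a, b)), frozenset((c, d))}
    edge_set |= {frozenset((a, d)), frozenset((c, b))}
    edges[i], edges[j] = (a, d), (c, b)
    return True


def randomize(edges, n_steps, seed=0):
    """Apply n_steps attempted rewiring steps; node connectivities never change."""
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = {frozenset(e) for e in edges}
    for _ in range(n_steps):
        rewire_step(edges, edge_set, rng)
    return edges
```

`randomize` counts attempted steps, including aborted ones; the text does not specify how many steps are needed for adequate mixing.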

Another simple numerical algorithm, generating such a random network "from scratch", was proposed in [2,5]. It starts by assigning to each node $i$ a number $k_i$ of "edge stubs" equal to its desired connectivity. A random network is then constructed by randomly picking two such edge stubs and joining them together to form a real edge connecting the two nodes. One of the limitations of this "stub reconnection" algorithm is that for a broad distribution of connectivities, which is usually the case in complex networks [1], the algorithm generates multiple edges joining the same pair of hub nodes. This problem cannot be avoided by simply not allowing multiple edges to form during the reconnection process, as in this case the whole algorithm would get stuck in a configuration in which the remaining edge stubs have no eligible partners. Fortunately, the local rewiring algorithm [3,4], instead of completely deconstructing a network and then randomly putting it back together, only gradually changes its wiring pattern. Hence, any topological constraint, such as that of no multiple edges or no disconnected components, can be maintained at each step of the way.
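To make the multiple-edge problem concrete, here is a sketch of stub reconnection in Python, assuming nodes are labeled `0..n-1`; `stub_reconnection` and `multiplicities` are hypothetical names, and self-loops are simply left in the output rather than handled.

```python
import random
from collections import Counter


def stub_reconnection(degrees, seed=0):
    """Pair edge stubs uniformly at random (the "stub reconnection" scheme).

    degrees: degrees[i] is the desired connectivity of node i (even total).
    Self-loops and multiple edges are NOT prevented.
    """
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng = random.Random(seed)
    rng.shuffle(stubs)
    # consecutive entries of the shuffled stub list become edges
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]


def multiplicities(edges):
    """Number of edges joining each unordered pair of distinct nodes."""
    return Counter(frozenset(e) for e in edges if e[0] != e[1])
```

With two hubs of connectivity 50 and a hundred nodes of connectivity 1, each of the 50 stubs of the first hub pairs with a stub of the second with probability 50/199, so the two hubs end up joined by about a dozen parallel edges on average.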

Once an ensemble of randomized versions of a given complex network is generated, the abundance of any topological pattern is compared between the real network and characteristic members of this ensemble. This comparison can be quantified using two natural parameters: 1) the ratio $R(j) = N(j)/N_r(j)$, where $N(j)$ is the number of times the pattern $j$ is observed in the real network, and $N_r(j)$ is the average number of its occurrences in the ensemble of its random counterparts; 2) the Z-score of the deviation, defined as $Z(j) = [N(j) - N_r(j)]/\Delta N_r(j)$, where $\Delta N_r(j)$ is the standard deviation of $N_r(j)$ in the randomized ensemble. This general idea was recently applied to protein networks in yeast [3] and E. coli [6].
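Both parameters can be computed from a list of pattern counts in the randomized ensemble. The sketch below uses the hypothetical name `pattern_significance` and assumes the population standard deviation, since the text does not say whether $\Delta N_r(j)$ is normalized by $n$ or $n-1$.

```python
import statistics


def pattern_significance(n_real, n_random):
    """Ratio R(j) and Z-score Z(j) of a pattern count against a randomized ensemble.

    n_real: N(j), the count of pattern j in the real network.
    n_random: counts of the same pattern, one per randomized network.
    """
    mean_r = statistics.mean(n_random)        # N_r(j)
    std_r = statistics.pstdev(n_random)       # Delta N_r(j), population std (assumption)
    ratio = n_real / mean_r if mean_r else float("inf")
    z = (n_real - mean_r) / std_r if std_r else float("nan")
    return ratio, z
```

With the triangle counts reported later in the text (6584 in the Internet versus $8636 \pm 224$ in its randomized versions), the same formulas give $R \approx 0.76$ and $Z \approx -9.2$.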

We now illustrate our general method using the example of the Internet, defined on the level of Autonomous Systems (AS). Autonomous Systems are large groups of workstations, servers, and routers, usually belonging to one organization such as a university or a business enterprise. The data on direct connections between Autonomous Systems are regularly updated and are available on the website of the National Laboratory for Applied Network Research [12]. This coarse-grained structure of the Internet has been the subject of several recent studies [7-10]. In the following analysis we use the millennium snapshot of the Internet (data from January 2, 2000), when $N = 6474$ Autonomous Systems were linked by $E = 12572$ bidirectional edges.

It was recently reported [7] that the Internet is characterized by a scale-free distribution of AS connectivities, $p(K) \propto 1/K^{\gamma}$ with $\gamma = 2.2 \pm 0.2$. One can show that for such a scale-free network the above-mentioned constraint of no multiple connections between nodes is extremely important. Indeed, the connectivity of the two most highly connected hubs in a scale-free network scales as $K_{\max} \sim N^{1/(\gamma-1)}$. In an uncorrelated random network with no constraints on edge multiplicity, the expected number of edges connecting these two hubs scales as $K_{\max}^2/(2E) \sim N^{2/(\gamma-1)-1}$ and increases indefinitely for $\gamma < 3$ (here we assumed that $E \propto N$). For the Internet this corresponds to the two largest hubs, with connectivities $K_0 = 1458$ and $K_1 = 750$ respectively, being connected by a whopping $K_0 K_1/(2E) = 1458 \cdot 750/(2 \cdot 12572) = 43.5$ edges! Hence, in this case the random network ensemble generated by our local rewiring algorithm is very different from the one generated by the stub reconnection algorithm and studied analytically in [2].
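The estimate above is simple arithmetic; the sketch below (hypothetical helper names) reproduces the 43.5 figure from the quoted connectivities and edge count, and the growth exponent $2/(\gamma-1)-1$, which is positive for $\gamma < 3$.

```python
def expected_hub_edges(k0, k1, n_edges):
    """Expected number of edges joining two nodes of connectivities k0 and k1
    when stubs are paired at random with multiple edges allowed: k0*k1/(2E)."""
    return k0 * k1 / (2 * n_edges)


def growth_exponent(gamma):
    """Exponent x in K_max**2/(2E) ~ N**x, using K_max ~ N**(1/(gamma-1)) and E ~ N."""
    return 2 / (gamma - 1) - 1


# Numbers quoted in the text for the January 2000 AS-level snapshot
print(round(expected_hub_edges(1458, 750, 12572), 1))  # 43.5
print(growth_exponent(2.2) > 0)                          # True
```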

Fig. 2 shows the average connectivity $\langle K_1 \rangle_{K_0}$ of neighbors of nodes with connectivity $K_0$ in the real Internet (squares), as well as in a typical random network with no multiple connections between nodes, generated by our local rewiring algorithm (circles). From this figure it is clear that most of the $\langle K_1 \rangle_{K_0} \propto K_0^{-0.5}$ dependence reported in Ref. [8] is reproduced in our random ensemble and hence can be attributed to the effective repulsion between hubs due to the constraint of having no more than one edge directly connecting them to each other. In the absence of correlations between node connectivities, by definition $\langle K_1 \rangle_{K_0} = \text{const} = \langle K^2 \rangle/\langle K \rangle$ [2]. This expression, shown as a horizontal line in Fig. 2, applies only to a randomized network in which multiple edges are allowed. In an ensemble of random scale-free networks with no multiple edges, the conditional probability distribution $P(K_1|K_0)$ crosses over from $K_1^{1-\gamma}$ for $K_1 \ll K^* = 2E/K_0$ to a power-law tail $K_1^{-\gamma}$ for $K_1 \gg K^*$. This makes $\langle K_1 \rangle_{K_0}$ asymptotically scale as $K_0^{-(3-\gamma)}$. We have confirmed numerically that $P(K_1|K_0)$ in our randomized ensemble has a very similar shape to that observed in the real Internet [10].
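The quantity plotted in Fig. 2 can be computed directly from an edge list. In this sketch (hypothetical names) `avg_neighbor_connectivity` returns $\langle K_1 \rangle_{K_0}$ for every connectivity present, and `uncorrelated_level` returns the flat value $\langle K^2 \rangle/\langle K \rangle$ that serves as the horizontal reference line.

```python
from collections import Counter, defaultdict


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def avg_neighbor_connectivity(edges):
    """Map each connectivity K0 to <K1>_{K0}, the mean connectivity of the
    neighbors of nodes with connectivity K0 (undirected simple graph)."""
    deg = degrees(edges)
    total = defaultdict(int)   # sum of neighbor connectivities, per K0
    count = defaultdict(int)   # number of (node, neighbor) pairs, per K0
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            total[deg[a]] += deg[b]
            count[deg[a]] += 1
    return {k: total[k] / count[k] for k in sorted(total)}


def uncorrelated_level(edges):
    """<K^2>/<K>: the constant expected without connectivity correlations
    when multiple edges are allowed."""
    ks = list(degrees(edges).values())
    return sum(k * k for k in ks) / sum(ks)
```

Averaging over (node, neighbor) pairs gives the same result as first averaging over the neighbors of each node and then over all nodes of connectivity $K_0$, because every such node contributes exactly $K_0$ pairs.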

From the above discussion one may get the impression that the topology of the Internet is in perfect agreement with its randomized version. This, however, is not true. Let $N(K_0, K_1)$ denote the total number of edges connecting nodes with connectivities $K_0$ and $K_1$. This is an example of a higher-level topological property of a complex network, which can be compared to its typical value $N_r(K_0, K_1)$ in the appropriate null-model network. By comparing $N(K_0, K_1)$ and $N_r(K_0, K_1)$ one measures the correlation profile of the complex network, formed by correlations in the connectivities of neighboring nodes. In Fig. 3 we visualize the correlation profile of the Internet by plotting the ratio $R(K_0,K_1) = N(K_0,K_1)/N_r(K_0,K_1)$. Regions of the $(K_0, K_1)$ plane where $R(K_0,K_1)$ is above (below) 1 correspond to enhanced (suppressed) connections between nodes with these connectivities in the Internet compared to its randomized counterpart. The statistical significance of these deviations, measured by the Z-score $Z(K_0,K_1) = [N(K_0,K_1) - N_r(K_0,K_1)]/\Delta N_r(K_0,K_1)$, is shown in Fig. 4. Our analysis is based on an ensemble of 1000 randomized networks with connectivities logarithmically binned into two bins per decade. In Figs. 3 and 4 one can see several prominent features:

• Strong suppression of edges between nodes of low connectivity, $1 < K_0, K_1 < 3$.

• Suppression of edges between nodes that are both of intermediate connectivity, $10 < K_0, K_1 < 100$.

• Strong enhancement of the number of edges connecting nodes of low connectivity, $1 < K_0 < 3$, to those of intermediate connectivity, $10 < K_1 < 100$.

On the other hand, every pair among the 5 hub nodes with $K_0, K_1 > 300$ was found to be connected by an edge, both in the real network and in a typical random sample. Hence $R(K_0,K_1)$ is close to 1 in the upper right corner of Fig. 3.
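The binned table $N(K_0, K_1)$ behind Figs. 3 and 4 can be accumulated as below. This is a sketch with hypothetical names; placing the bin boundaries at integer powers of $10^{1/2}$ is our assumption, since the text only states two bins per decade. $R(K_0,K_1)$ and $Z(K_0,K_1)$ then follow bin by bin from the ensemble mean and standard deviation of the same table computed on the randomized networks.

```python
import math
from collections import Counter


def log_bin(k, bins_per_decade=2):
    """Index of the logarithmic bin containing connectivity k >= 1."""
    return math.floor(bins_per_decade * math.log10(k))


def binned_profile(edges, bins_per_decade=2):
    """N(K0, K1) on logarithmic connectivity bins.

    Each edge is counted in both orientations, so the table is symmetric.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = Counter()
    for u, v in edges:
        bu = log_bin(deg[u], bins_per_decade)
        bv = log_bin(deg[v], bins_per_decade)
        n[(bu, bv)] += 1
        n[(bv, bu)] += 1
    return n
```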

The strong suppression of connections between pairs of nodes of low connectivity can in part be attributed to the constraint that all ASs on the Internet have to be connected to each other by at least one path. We have explicitly checked that there are indeed no isolated clusters in our data for the Internet. However, when we used an ensemble of random networks in which the formation of isolated clusters was prevented at every rewiring step, we found very little change in the observed correlation profile. The division of all nodes on the Internet into three distinct groups of low-, intermediate-, and highly-connected ones, visible in its correlation profile, may be due to its hierarchical structure of, correspondingly, users, low-level (possibly regional) Internet Service Providers (ISPs), and high-level (global) ISPs. A similar hierarchical picture was recently suggested in Ref. [11] on the basis of traceroute data.

It is worthwhile to note that the correlation profile of the Internet measured in this work makes it qualitatively different from the yeast protein networks analyzed by us earlier [3]. Those molecular networks are characterized by suppressed connections between nodes of very high connectivity and an increased number of links between nodes of intermediate connectivity. Thus the correlation profile allows one to differentiate between otherwise very similar scale-free networks in various complex systems.

The correlation profile is by no means the only topological pattern one can investigate in a given complex network; other examples include its spectral dimension [13], the betweenness of its edges and nodes [14,8], and feedback loops, feed-forward loops, and other small network motifs [6]. In the rest of this paper we analyze the level of clustering [15] of the Internet, quantified by its number of loops of length 3 (triangles). The real Internet contains 6584 such loops, while its random counterparts, generated by our local rewiring algorithm, have $8636 \pm 224$ triangles (this and all subsequent results were measured in an ensemble of 100 randomized networks). Thus the clustering of the real Internet is some 9 standard deviations below its value in a randomized network! This result is surprising because there are good reasons for the Internet to have an above-average level of clustering. Indeed, one expects its nodes to link preferentially according to their geographical location [8,9], the general type of business or academic enterprise they represent, etc. All these factors usually tend to increase clustering [15]. On the other hand, the correlation profile of the Internet visualized in Fig. 3 naturally leads to a reduction in clustering. Indeed, the suppression of connections between nodes of intermediate connectivity in favor of nodes of low connectivity should reduce the number of triangles in the network.
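Counting triangles, the loops of length 3 used here as the measure of clustering, only requires neighbor sets. A sketch, with `count_triangles` as a hypothetical name: every common neighbor of the two endpoints of an edge closes one triangle, and each triangle is found once from each of its three edges.

```python
from collections import defaultdict


def count_triangles(edges):
    """Number of triangles (loops of length 3) in an undirected simple graph."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # each common neighbor w of u and v closes the triangle u-v-w
    total = sum(len(nbrs[u] & nbrs[v]) for u, v in edges)
    return total // 3   # every triangle was seen from each of its 3 edges
```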

In order to explore the interplay between the level of clustering in the network and its correlation profile, we studied two "extremal" random networks with the same connectivities of nodes as the real Internet. The first network contained no triangles, while the second one had a whopping 59144 triangles. Both networks were generated using a simple modification of our basic local rewiring algorithm, in which a rewiring step was accepted only if it did not increase (in the first case) or decrease (in the second case) the number of triangles in the network. In the first case, after some transient time all triangles disappeared from the network, at which point we measured its correlation profile (Fig. 5). In the second case our algorithm was designed to generate a network with the largest possible number of triangles. Computer time limitations forced us to stop the program when we reached 59144 triangles, which, as will be shown later, is rather close to the absolute maximum of 63844 triangles for a given set of node connectivities. The correlation profile of this very clustered network is shown in Fig. 6.
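The modified rewiring can be sketched by computing the change in the triangle count caused by each proposed swap and rejecting it when it has the wrong sign. All names are hypothetical. Because the four nodes A, B, C, D are distinct, no triangle contains both removed edges or both added edges, so the change is the triangles gained through A-D and C-B minus those lost through A-B and C-D.

```python
import random
from collections import defaultdict


def triangle_biased_rewiring(edges, n_steps, minimize=True, seed=0):
    """Local rewiring (Fig. 1) that rejects any swap increasing (minimize=True)
    or decreasing (minimize=False) the number of triangles.

    Returns the rewired edge list and the net change in the triangle count.
    """
    rng = random.Random(seed)
    edges = list(edges)
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def common(x, y):              # triangles through the edge x-y
        return len(nbrs[x] & nbrs[y])

    def link(x, y, on):
        if on:
            nbrs[x].add(y)
            nbrs[y].add(x)
        else:
            nbrs[x].discard(y)
            nbrs[y].discard(x)

    total_change = 0
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        if len({a, b, c, d}) < 4 or d in nbrs[a] or b in nbrs[c]:
            continue               # self-loop or multiple edge: abort
        lost = common(a, b) + common(c, d)
        link(a, b, False)
        link(c, d, False)
        gained = common(a, d) + common(c, b)
        delta = gained - lost
        if (delta <= 0) if minimize else (delta >= 0):
            link(a, d, True)
            link(c, b, True)
            edges[i], edges[j] = (a, d), (c, b)
            total_change += delta
        else:
            link(a, b, True)       # rejected: restore the original pair
            link(c, d, True)
    return edges, total_change
```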
From Fig. 5 one concludes that a correlation profile in which connections between hubs are suppressed in favor of connections between hubs and nodes of low connectivity favors a reduced number of triangles. If instead nodes with similar connectivities (including hubs) prefer to connect to each other (the light-colored area on or around the diagonal in Fig. 6), the number of triangles is typically increased. This can in fact also be demonstrated analytically. Consider an edge connecting a pair of nodes with connectivities $K_0$ and $K_1$. The maximal number of triangles containing this edge is $\min(K_0-1, K_1-1)$. Indeed, in the best-case scenario all $K-1$ remaining neighbors of the node with the smaller connectivity $K$ are also neighbors of the node with the larger connectivity. Therefore, given a correlation profile specified by $N(K_0,K_1)$, the number of edges connecting nodes with connectivities $K_0$ and $K_1$ (each edge counted twice, once as $(K_0,K_1)$ and once as $(K_1,K_0)$), the absolute maximum number of triangles in the network is given by
$$N_\triangle^{\max} = \frac{1}{6}\sum_{K_0,K_1} N(K_0,K_1)\,\min(K_0-1, K_1-1).$$
Here the factor 1/6 corrects for the fact that in our counting scheme each triangle would be counted 2 times along each of its three sides. Using the identities $\min(K_0-1, K_1-1) = (K_0-1+K_1-1)/2 - |K_0-K_1|/2$ and $\sum_{K_0,K_1} N(K_0,K_1)(K_0-1) = \sum_{K_0,K_1} N(K_0,K_1)(K_1-1) = N\langle K(K-1)\rangle$, one finally gets:

$$N_\triangle^{\max} = \frac{N\langle K(K-1)\rangle}{6} - \frac{1}{12}\sum_{K_0,K_1} N(K_0,K_1)\,|K_0-K_1|. \qquad (1)$$
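Eq. (1) can be checked numerically against the direct bound. In this sketch (`max_triangles` is a hypothetical name) the direct form sums $\min(K_0-1, K_1-1)$ over unordered edges and divides by 3, while the Eq. (1) form uses ordered pairs, so each edge contributes twice to the second term. The first term alone is the naive estimate $N\langle K(K-1)\rangle/6$.

```python
from collections import Counter


def max_triangles(edges):
    """Upper bound on triangles given the correlation profile, evaluated
    directly and via Eq. (1); the two values should coincide."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Direct form: each unordered edge lies in at most min(K0-1, K1-1)
    # triangles, and every triangle has three edges.
    direct = sum(min(deg[u] - 1, deg[v] - 1) for u, v in edges) / 3
    # Eq. (1): N<K(K-1)>/6 minus (1/12) * sum over ordered pairs of |K0-K1|;
    # each unordered edge appears twice among the ordered pairs.
    first = sum(k * (k - 1) for k in deg.values()) / 6
    second = sum(2 * abs(deg[u] - deg[v]) for u, v in edges) / 12
    return direct, first - second
```

For a star the naive first term is positive while both forms give zero, an extreme instance of the second term removing triangles that connections between unequal connectivities cannot support.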

The first term of this expression corresponds to a hypothetical situation of maximal cliquishness, in which all neighbors of every node are connected to each other. It is easy to see that, except for some very special cases of the distribution of connectivities, such maximal cliquishness can never be realized. Indeed, whenever a pair of nodes of unequal connectivities $K_0 \neq K_1$ are connected to each other, the second term in Eq. (1) decreases the maximal number of triangles. Given the set of node connectivities $K_i$, one can easily construct the network with the largest possible number of triangles. One starts by connecting the largest hub node to other nodes in the order of decreasing connectivities. In the second round of this algorithm one selects the remaining neighbors of the second-largest hub in the order of decreasing connectivity. The process continues round by round until the neighbors of all nodes are specified. When a node reaches its desired connectivity it is simply skipped during later rounds of this algorithm. One can show that the network generated by this algorithm has the smallest value of $\sum_{K_0,K_1} N(K_0,K_1)|K_0-K_1|$ and the largest number of triangles among all networks with a given set of node connectivities. In the case of the Internet such a network has 63,884 triangles, just below the $N_\triangle^{\max} = 64{,}702$ specified by its correlation profile. These numbers of triangles are an order of magnitude below the naive estimate $N\langle K(K-1)\rangle/6 \approx 690{,}000$ traditionally used as a normalization factor in the formula for the clustering coefficient of a network [15]. Hence, by this definition even the loopiest network with the same node connectivities as the Internet has a clustering coefficient of only 0.09! For the "native" correlation profile of the Internet, Eq. (1) predicts a maximal number of triangles close to 24,000, which puts the observed level of clustering (6584 triangles) at around 27% of its maximal value for this correlation profile.
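The round-by-round construction can be sketched as follows, with `greedy_max_triangle_graph` as a hypothetical name. Nodes are ranked once by desired connectivity, and in each round a hub is linked to later-ranked nodes that still have free stubs. This sketch does not handle degree sequences for which the greedy rounds leave some stubs unmatched; it simply leaves those nodes short.

```python
def greedy_max_triangle_graph(degrees):
    """Round-by-round construction described in the text.

    degrees: desired connectivity of each node (node ids are list indices).
    Returns the edge list; nodes whose stubs cannot be matched are left short.
    """
    order = sorted(range(len(degrees)), key=lambda i: -degrees[i])
    remaining = list(degrees)
    edges = []
    for pos, hub in enumerate(order):
        # link this hub to later nodes in order of decreasing connectivity,
        # skipping nodes that already reached their desired connectivity
        for other in order[pos + 1:]:
            if remaining[hub] == 0:
                break
            if remaining[other] > 0:
                edges.append((hub, other))
                remaining[hub] -= 1
                remaining[other] -= 1
    return edges
```

The clustering coefficient quoted above is the ratio of the two triangle counts: $63{,}884/690{,}000 \approx 0.09$.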

In order to check whether the connectivity correlations visible in the correlation profile of the Internet (Fig. 3) can fully account for its number of triangles, we generated an ensemble of random networks that preserves not only the connectivities but also the correlation profile of the complex network. To this end we used a modification of our main local rewiring algorithm. There are two principal ways in which this can be done. In the first scheme, reminiscent of generating a microcanonical ensemble in statistical physics, one allows only those local rewiring steps that strictly conserve the number of edges $N(K_0,K_1)$ between nodes with connectivities $K_0$ and $K_1$. This is achieved by constraining the selection of pairs of edges for the rewiring step of Fig. 1 to only those connecting nodes with connectivities $K_0, K_1$ and $K_0, K_1'$, so that the two nodes exchanging partners (A and C in Fig. 1) have the same connectivity $K_0$. It is easy to see that such a local rewiring step strictly conserves $N(K_0,K_1)$. In practice we softened the randomization constraints by coarse-graining the logarithm of connectivity into half-decade bins. Using this "microcanonical algorithm" we generated an ensemble of networks with $4132 \pm 75$ loops. The fact that the number of loops in the real Internet (6584) is now significantly larger than in these random networks confirms the intuitive notion that the Internet is indeed characterized by a significant degree of clustering. We have also found that this 60% increase in the level of clustering is equally spread over the whole spectrum of connectivities.
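The microcanonical restriction amounts to one extra test in the rewiring loop: a swap A-B, C-D to A-D, C-B is attempted only when A and C fall in the same connectivity bin, so the multiset of edge types is unchanged. The sketch below uses hypothetical names and assumes half-decade bins starting at $K = 1$.

```python
import math
import random
from collections import Counter


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def edge_types(edges, bins_per_decade=2):
    """Multiset of unordered (bin, bin) edge types: a binned N(K0, K1)."""
    deg = degrees(edges)

    def bin_of(node):
        return math.floor(bins_per_decade * math.log10(deg[node]))

    return Counter(tuple(sorted((bin_of(u), bin_of(v)))) for u, v in edges)


def microcanonical_randomize(edges, n_steps, bins_per_decade=2, seed=0):
    """Rewiring of Fig. 1 restricted to swaps A-B, C-D -> A-D, C-B in which
    A and C share a connectivity bin, so the binned N(K0, K1) is conserved."""
    rng = random.Random(seed)
    deg = degrees(edges)           # unchanged by any swap

    def bin_of(node):
        return math.floor(bins_per_decade * math.log10(deg[node]))

    edges = list(edges)
    edge_set = {frozenset(e) for e in edges}
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:     # either endpoint may play the role of A
            a, b = b, a
        if rng.random() < 0.5:     # either endpoint may play the role of C
            c, d = d, c
        if bin_of(a) != bin_of(c):
            continue               # swap would change N(K0, K1)
        if len({a, b, c, d}) < 4:
            continue               # self-loop or shared endpoint
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue               # would create a multiple edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```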

As is always the case with microcanonical algorithms, one should worry whether the above algorithm is ergodic. In other words, there is no guarantee that the system does not get trapped in a disconnected component of the phase space. This is easily checked by annealing the network using a canonical Metropolis algorithm [16] with an energy function, or Hamiltonian, which in our case can be defined as $H = \sum_{K_0,K_1} [N(K_0,K_1) - N_r(K_0,K_1)]^2/N(K_0,K_1)$, and sampling networks at a finite temperature $T$. Local moves lowering the Hamiltonian are always accepted, while those increasing it by $\Delta H$ are accepted only with probability $\exp(-\Delta H/T)$. As seen in Fig. 7, the above algorithm nicely interpolates between the microcanonical algorithm for small $T$ and the unrestricted local rewiring algorithm for large $T$. This confirms that our microcanonical algorithm is indeed ergodic.
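The canonical variant can be sketched as a Metropolis loop over the same swaps. In this sketch (all names hypothetical) the profile is binned two bins per decade, `target` holds $N(K_0,K_1)$ of the real network, and the running counts of the rewired network play the role of $N_r(K_0,K_1)$. Using a denominator of 1 for bins absent from the target is our own assumption, since the text does not say how such bins enter $H$.

```python
import math
import random
from collections import Counter


def degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def edge_type(deg, u, v, bins_per_decade=2):
    """Unordered pair of logarithmic connectivity bins of the edge u-v."""
    bu = math.floor(bins_per_decade * math.log10(deg[u]))
    bv = math.floor(bins_per_decade * math.log10(deg[v]))
    return (min(bu, bv), max(bu, bv))


def profile_counts(edges):
    deg = degrees(edges)
    return Counter(edge_type(deg, u, v) for u, v in edges)


def metropolis_rewire(edges, target, n_steps, temperature, seed=0):
    """Sample rewired networks with H = sum_bins [N - N_r]**2 / N, where
    N = target[bin]; temperature must be positive."""
    rng = random.Random(seed)
    deg = degrees(edges)
    edges = list(edges)
    edge_set = {frozenset(e) for e in edges}
    current = Counter(edge_type(deg, u, v) for u, v in edges)

    def term(key, n):
        # max(..., 1) guards bins absent from the target (our assumption)
        return (target.get(key, 0) - n) ** 2 / max(target.get(key, 0), 1)

    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        if (len({a, b, c, d}) < 4 or frozenset((a, d)) in edge_set
                or frozenset((c, b)) in edge_set):
            continue
        change = Counter([edge_type(deg, a, d), edge_type(deg, c, b)])
        change.subtract([edge_type(deg, a, b), edge_type(deg, c, d)])
        delta_h = sum(term(k, current[k] + dn) - term(k, current[k])
                      for k, dn in change.items())
        if delta_h > 0 and rng.random() >= math.exp(-delta_h / temperature):
            continue               # uphill move rejected
        for k, dn in change.items():
            current[k] += dn
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges, current
```

At very low $T$ every move that changes the binned profile is rejected, reproducing the microcanonical constraint; at very high $T$ almost every valid swap is accepted, as in unrestricted rewiring.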

Another conceivable use of the Metropolis algorithm described above is to generate an artificial network with a given distribution of connectivities $p(K)$ and a given correlation profile $R(K_0,K_1)$. To achieve this, one first generates a seed network with a given $p(K)$, e.g., by the stub reconnection algorithm of Refs. [5,2]. This network is first annealed using the Metropolis algorithm with an energy functional penalizing multiple connections between nodes. The resulting network, containing no multiple connections, is subsequently annealed with another energy functional favoring the desired correlation profile. This results in an ensemble of random networks with no multiple connections between nodes and the desired correlation profile.
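The first annealing stage can be sketched as below. The text does not give the form of the energy functional penalizing multiple connections, so `surplus` (every edge beyond the first on a node pair, plus every self-loop, since stub reconnection also produces those) is our own hypothetical choice, as are the other names. Swaps conserve every node's connectivity even in a multigraph.

```python
import math
import random
from collections import Counter


def surplus(key, m):
    """Hypothetical penalty for m edges on node pair `key`: a self-loop
    (one-element key) counts fully, otherwise every edge beyond the first."""
    return m if len(key) == 1 else max(m - 1, 0)


def total_surplus(edges):
    pairs = Counter(frozenset(e) for e in edges)
    return sum(surplus(k, m) for k, m in pairs.items())


def anneal_multi_edges(edges, n_steps, temperature, seed=0):
    """Metropolis rewiring of a multigraph (e.g. from stub reconnection)
    with energy total_surplus; temperature must be positive."""
    rng = random.Random(seed)
    edges = list(edges)
    pairs = Counter(frozenset(e) for e in edges)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        change = Counter([frozenset((a, d)), frozenset((c, b))])
        change.subtract([frozenset((a, b)), frozenset((c, d))])
        delta = sum(surplus(k, pairs[k] + dn) - surplus(k, pairs[k])
                    for k, dn in change.items())
        if delta > 0 and rng.random() >= math.exp(-delta / temperature):
            continue               # uphill move rejected
        for k, dn in change.items():
            pairs[k] += dn
        edges[i], edges[j] = (a, d), (c, b)
    return edges
```

The output of this stage would then be passed to a second annealing run whose energy favors the desired correlation profile.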

In summary, we have proposed a general algorithm to detect characteristic topological features in a given complex network. In particular, we introduced the concept of the correlation profile, which allowed us to quantify differences between complex networks even when their connectivity distributions are similar to each other. Applied to the Internet, this profile identifies hierarchical features of its structure and helps to account for the level of clustering in this network.

Work at Brookhaven National Laboratory was carried 
out under Contract No. DE-AC02-98CH10886, Division 
of Material Science, U.S. Department of Energy. 
FIG. 2. The average connectivity $\langle K_1 \rangle_{K_0}$ of neighbors of nodes with connectivity $K_0$ in the Internet (squares) and in its typical randomized counterpart (circles). Error bars from multiple realizations of the randomized network are smaller than the symbol sizes. The horizontal line is the analytical result $\langle K_1 \rangle_{K_0} = \text{const} = \langle K^2 \rangle/\langle K \rangle \approx 165$, valid for a random network in which multiple edges between pairs of nodes are allowed [2].
FIG. 3. Correlation profile of the Internet. The ratio $R(K_0,K_1) = N(K_0,K_1)/N_r(K_0,K_1)$, where $N(K_0,K_1)$ is the total number of edges in the Internet connecting pairs of Autonomous Systems with connectivities $K_0$ and $K_1$, while $N_r(K_0,K_1)$ is the same quantity in the ensemble of randomized versions of the Internet, generated by the local rewiring algorithm described in the text.
FIG. 4. Statistical significance of correlations in the Internet. The Z-score of correlation patterns in the Internet, $Z(K_0,K_1) = [N(K_0,K_1) - N_r(K_0,K_1)]/\Delta N_r(K_0,K_1)$. Here $\Delta N_r(K_0,K_1)$ is the standard deviation of $N_r(K_0,K_1)$ measured in an ensemble of 1000 randomized networks.
FIG. 6. The correlation profile $R(K_0,K_1)$ of a network with the same set of connectivities as the Internet but with a very large number of triangles (59144). Note the tendency of nodes with similar connectivities to connect to each other.




FIG. 7. The number of loops as a function of temperature observed in an ensemble of random versions of the Internet generated by the Metropolis algorithm with the energy function $H = \sum_{K_0,K_1} [N(K_0,K_1) - N_r(K_0,K_1)]^2/N(K_0,K_1)$. Upper and lower triangles represent the standard deviation within an ensemble.



FIG. 5. The correlation profile $R(K_0,K_1)$ of a network with the same set of connectivities as the Internet but with no triangles. Note the suppression of connections between different hubs in favor of connections between hubs and nodes of low connectivity.